Netflix New Zealand vs Netflix USA Library Comparisons

During June 2015 NZRS logged into Netflix from within New Zealand and the USA and observed the content that was available. From each page, the titles offered by the service in that geography were extracted and stored.

Each title was compared against the OMDb API (http://www.omdbapi.com/), an alternative interface to Internet Movie Database (IMDb) data. This allowed each title to be matched against data held within IMDb and the title data to be augmented.

This included the following:

  • Plot
  • Poster
  • Rated
  • Language
  • Title
  • Country
  • Writer
  • Metascore
  • imdbRating
  • Director
  • Released
  • Actors
  • Year
  • Genre
  • Awards
  • Runtime
  • Type
  • Response
  • imdbVotes
  • imdbID

The data was compiled and the serialised outputs saved for further analysis using the Python pickle module. The data can be found here.[[[link to pickles]]]]
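As a minimal sketch of that save and load round trip, the block below pickles a small dictionary and reads it back. The sample record and the temporary path are made up for illustration; the real files are the `all_movies_dict.p` pickles loaded further down.

```python
import os
import pickle
import tempfile

# Hypothetical sample in the same shape as the title dictionaries:
# URL-quoted title -> dict of OMDb fields.
sample_titles = {
    'The%20Pink%20Panther': {'Title': 'The Pink Panther', 'Response': 'True'},
}

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'all_movies_dict.p')

# Serialise the dictionary to disk ...
with open(path, 'wb') as handle:
    pickle.dump(sample_titles, handle)

# ... and load it back for analysis.
with open(path, 'rb') as handle:
    restored = pickle.load(handle)
```

Pickle files can execute code when loaded, so they should only be opened when they come from a source you trust.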

A Python module was created to help with analysis and is available on GitHub (https://github.com/NZRS/content-analysis/blob/master/content_stats.py).

The analysis focussed on titles, so it does not at present identify whether a title is a movie or a series. This may be ascertainable from the OMDb 'Type' field, though that still does not give the number of episodes, nor how many episodes are on Netflix. Qualitatively, Netflix NZ is missing the latest series of Doctor Who as well as series from 'classic Who'.
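A rough sketch of how the 'Type' field could be tallied is shown here. The sample records and the `count_types` helper are hypothetical and not part of `content_stats`; titles with no OMDb match are assumed to carry no 'Type' key.

```python
from collections import Counter

# Hypothetical records; real ones come from the pickled dictionaries.
sample_titles = {
    'Firefly': {'Type': 'series'},
    'Fight%20Club': {'Type': 'movie'},
    'Doctor%20Who': {'Type': 'series'},
    'Unmatched%20Title': {'Response': 'False'},  # no OMDb match, so no 'Type'
}


def count_types(titles):
    """Tally the OMDb 'Type' of each title, using 'unknown' when it is absent."""
    return Counter(record.get('Type', 'unknown') for record in titles.values())


type_counts = count_types(sample_titles)
```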


In [21]:
%matplotlib inline
import pickle
import plotly.plotly as py
from plotly.graph_objs import *
# module from NZRS
import content_stats
from IPython.core.display import Image
from urllib2 import quote
from IPython.display import display
import plotly.tools as tls
from IPython.display import HTML
from collections import Counter

In [22]:
# load previously pickled dictionaries

nz_data = pickle.load(open('nz/all_movies_dict.p', 'rb'))
us_data = pickle.load(open('us/all_movies_dict.p', 'rb'))

For all charts we present, the data we use is available for exploration and reuse. There is a 'Play with this data' link at the bottom right-hand side of each chart.

If you are running these IPython notebooks yourself, please note we are embedding the graphs rather than creating them. You can uncomment the creation code and comment out the embed code if you are using the notebooks interactively.

Library Size

The simplest test we can carry out is looking at library size. We are counting titles, not discrete episodes or total viewing time. It is still a useful first test for beginning to understand the libraries.


In [23]:
total_titles_nz = len(nz_data)
total_titles_us = len(us_data)

data = (
    [Bar( x = ['NZ', 'USA'],
          y = [total_titles_nz, total_titles_us],
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )]
    )

layout = Layout(
    title ='Netflix Library Comparison- USA vs NZ - June 2015',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'Geographic Service'),
    
    )     

fig = Figure(data=data, layout=layout)

# Run this to generate the plot.ly plot for yourself, once created we will embed it
# py.iplot(fig, filename = 'Netflix-Library-Comparison-June-2015')

# Run this to embed the plot after creation, this is necessary for rendering in Github in particular.

HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/97/" title="Netflix Library Comparison- USA vs NZ - June 2015" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/97.png" alt="Netflix Library Comparison- USA vs NZ - June 2015" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:97" src="https://plot.ly/embed.js" async></script>
</div>

<div>

<a href="https://plot.ly/~gotofftherails/97/">Link to interactive chart and data</a> 

</div>


'''
)




Uniqueness of Content

Further up you will have noticed we imported a Python module called 'content_stats'. This lets us apply some more exploratory stats.

The uniqueness of content between libraries has been questioned by several people.
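Conceptually, a comparison like this comes down to set operations on the dictionary keys. The sketch below is only an illustration of that idea under our own assumptions; `compare_keys` and the sample dictionaries are hypothetical, and the actual `content_stats.Compare_regions` implementation may differ.

```python
def compare_keys(first, second):
    """Split the keys of two title dictionaries into common and unique sets."""
    first_keys, second_keys = set(first), set(second)
    return {
        'common': first_keys & second_keys,
        'unique_to_first': first_keys - second_keys,
        'unique_to_second': second_keys - first_keys,
    }


# Hypothetical miniature libraries keyed by URL-quoted title.
nz_sample = {'Firefly': {}, 'Top%20Gear': {}, 'Human%20Planet': {}}
us_sample = {'Firefly': {}, 'Top%20Gear': {}, 'Tomb%20Raider': {},
             'Long%20Way%20Round': {}}

result = compare_keys(us_sample, nz_sample)
```

Matching on the title string alone treats two different works with the same name as one title, which is worth keeping in mind when reading the counts.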


In [24]:
# Number of titles in common
common = len(content_stats.Compare_regions(us_data, nz_data).common_titles())
print 'Titles in common between USA and NZ:', common


# Number of titles unique to us
unique_us = len(content_stats.Compare_regions(us_data, nz_data).unique_to_first())
print 'Titles Unique to the USA :', unique_us

# Number of titles unique to nz
unique_nz = len(content_stats.Compare_regions(nz_data, us_data).unique_to_first())
print 'Titles Unique to the NZ :', unique_nz


Titles in common between USA and NZ: 468
Titles Unique to the USA : 3860
Titles Unique to the NZ : 966

We can represent this graphically again.


In [25]:
trace1 = Bar(
        y=['Service'],
        x=[unique_nz],
        name='Unique to NZ',
        orientation = 'h',
    
        marker = Marker(
            color = 'rgba(255,127,14,1.0)'        )
    
        )

trace2 = Bar(
        y=['Service'],
        x=[common],
        name='Common',
        orientation = 'h',
            marker = Marker(
            color = 'rgba(44,160,44,1.0)'        )
    
        )


trace3 = Bar(
        y = ['Service'],
        x = [unique_us],
        name = 'Unique to USA',
        orientation = 'h',
            marker = Marker(
            color = 'rgba(39,119,180,1.0)'        )
    
        )


data = Data([trace1, trace2, trace3])
layout = Layout(
    barmode='stack',
    title = 'Content Unique to and Common Between US and NZ Netflix Services - June 2015',
    yaxis = YAxis(title = ''),
    xaxis = XAxis(title = 'Count of Titles')
    )

fig = Figure(data=data, layout=layout)

# used to create
#py.iplot(fig, filename = 'Netflix-Library-Comparison-June-2015-Uniqueness of Content')

# used to embed

HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/127/" target="_blank" title="Unique to NZ, Common Between NZ and USA, Unique to USA" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/127.png" alt="Unique to NZ, Common Between NZ and USA, Unique to USA" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:127" src="https://plot.ly/embed.js" async></script>
</div>

<div>

    <a href="https://plot.ly/~gotofftherails/127/">Link to interactive chart and data</a>

</div>

''')




Quality of Content

Quality is by definition qualitative. With enough measures of quality, though, we can hopefully arrive at some quantitative measure, in the same way that we can say a five star hotel is normally going to be better than a one star hotel. We know this is not always the case, and we do have the Napoleon Dynamite effect, where the hate and love for a title can both be strong.

To assess quality we looked towards IMDb, who make some of their data available via alternative interfaces, though not in a structured API. Luckily OMDb offers a RESTful API that allows querying by title or ID, and returns XML or JSON.
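A query URL for a title lookup could be built along these lines. This is a sketch rather than the code we ran: `build_omdb_url` is a hypothetical helper, and the parameter names `t` (title), `r` (response format) and `apikey` reflect our reading of the OMDb documentation, so they should be checked against the current API before use.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

OMDB_ENDPOINT = 'http://www.omdbapi.com/'


def build_omdb_url(title, api_key=None):
    """Build a title-lookup URL; parameter names are assumed, not verified."""
    params = [('t', title), ('r', 'json')]
    if api_key is not None:
        # An API key is assumed to be optional here; the service may require one.
        params.append(('apikey', api_key))
    return OMDB_ENDPOINT + '?' + urlencode(params)


url = build_omdb_url('The Pink Panther')
```

The block only builds the URL and makes no network request, so it can be run without access to the service.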

We used this to query each title against OMDb. Not all queries returned a useful response; Doctor Who fans will be pleased to know that 1996's made-for-TV movie was not recognised when we queried.

We can look at which titles we did not get a response for. This is useful, although it does not tell us where we got a false match; we are hoping there were not too many of those.


In [26]:
# Count of titles we did not get a response for 
nz_no_response = len([k for (k, v) in nz_data.iteritems() if v['Response'] == 'False'])
print 'Count of NZ titles we did not get a response for: ', nz_no_response
print 'Percentage of NZ titles: ', round(float(nz_no_response)/float(len(nz_data))*100), '%'

#US count?


Count of NZ titles we did not get a response for:  163
Percentage of NZ titles:  11.0 %

We're pretty happy with this percentage for getting an understanding of the available content from a quality perspective. We cannot be absolutely certain of every match, but it is good enough.
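The same check could be wrapped in a small helper so that it can be run against either library. `no_response_stats` and the sample data are hypothetical; the only thing taken from the notebook is that unmatched titles carry the string 'False' in their 'Response' field.

```python
def no_response_stats(titles):
    """Return (count, percentage) of titles whose OMDb lookup failed."""
    missing = [key for key, record in titles.items()
               if record.get('Response') == 'False']
    percentage = 100.0 * len(missing) / len(titles) if titles else 0.0
    return len(missing), percentage


# Hypothetical sample: one of four lookups failed.
sample_titles = {
    'Firefly': {'Response': 'True'},
    'Top%20Gear': {'Response': 'True'},
    'Fight%20Club': {'Response': 'True'},
    'Unmatched%20Title': {'Response': 'False'},
}

missing_count, missing_pct = no_response_stats(sample_titles)
```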

We can now look at the difference in quality.

In our assessment of quality we are not looking at Netflix ratings; we are looking at the ratings held in IMDb. It could be that Netflix has content more suited to its customers, who would rate those titles higher. Those rating on IMDb may be skewed in a particular way, as they may have more interest in esoteric aspects of movies and content. Perhaps another useful metric would be Rotten Tomatoes scores, though we don't have complete information on this and we were declined access to the Rotten Tomatoes API.

Average Score


In [27]:
# Average IMDB score of NZ geographic content
nz_avg_score = content_stats.Title_stats(nz_data).average_score()

# Average IMDB score of US geographic content
us_avg_score = content_stats.Title_stats(us_data).average_score()

print 'NZ Average Score via OMDB: ', round(nz_avg_score,2)
print 'US Average Score via OMDB: ', round(us_avg_score,2)


NZ Average Score via OMDB:  6.68
US Average Score via OMDB:  6.37

New Zealand may have a smaller catalogue but it does seem to have a marginally higher average quality than the US. We can look at what the top movies are (based on IMDb/OMDb ratings) between the two countries.
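An average like this has to cope with titles that have no usable rating. The sketch below shows one way to do that, assuming ratings are stored as strings in 'imdbRating' and that missing ratings appear as 'N/A' or are absent altogether; `average_rating` is a hypothetical helper and not necessarily how `content_stats` computes its figure.

```python
def average_rating(titles):
    """Mean of the numeric 'imdbRating' values, ignoring unusable ones."""
    ratings = []
    for record in titles.values():
        try:
            ratings.append(float(record.get('imdbRating', 'N/A')))
        except (TypeError, ValueError):
            # 'N/A', None or other non-numeric values are skipped.
            continue
    return sum(ratings) / len(ratings) if ratings else None


# Hypothetical sample: two rated titles, one 'N/A', one unmatched.
sample_titles = {
    'The%20Shawshank%20Redemption': {'imdbRating': '9.3'},
    'Top%20Gear': {'imdbRating': '9.0'},
    'Obscure%20Title': {'imdbRating': 'N/A'},
    'Unmatched%20Title': {'Response': 'False'},
}

sample_average = average_rating(sample_titles)
```

Skipping unrated titles means the average describes only the matched part of each library, which is one reason the share of failed lookups matters.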

Top Movies

This gives an interesting understanding of the top movies. At first look we can see the titles are very different.
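Ranking titles and averaging the top few can be sketched as follows. `top_rated` and `mean_rating` are hypothetical helpers written for illustration; they return `(title, rating)` tuples, which is the shape the cells in this section index with `tup[0]` and `tup[1]`.

```python
def top_rated(titles, n):
    """Return the n highest rated (title, rating) pairs, best first."""
    rated = []
    for record in titles.values():
        try:
            rated.append((record['Title'], float(record['imdbRating'])))
        except (KeyError, TypeError, ValueError):
            continue  # skip titles without a usable rating
    return sorted(rated, key=lambda pair: pair[1], reverse=True)[:n]


def mean_rating(pairs):
    """Average rating of a list of (title, rating) pairs."""
    return sum(rating for _, rating in pairs) / len(pairs)


# Hypothetical miniature library.
sample_titles = {
    'The%20Shawshank%20Redemption': {'Title': 'The Shawshank Redemption',
                                     'imdbRating': '9.3'},
    'Firefly': {'Title': 'Firefly', 'imdbRating': '9.2'},
    'Fight%20Club': {'Title': 'Fight Club', 'imdbRating': '8.9'},
    'Obscure%20Title': {'Title': 'Obscure Title', 'imdbRating': 'N/A'},
}

top_two = top_rated(sample_titles, 2)
top_two_mean = mean_rating(top_two)
```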


In [28]:
top_nz_titles = content_stats.Title_stats(nz_data).top_movies(25)
top_us_titles = content_stats.Title_stats(us_data).top_movies(25)

print 'Top NZ Titles'
print

# Truncate to 5
for tup in top_nz_titles[:5]:
    print 'Title :', tup[0]
    print 'Rating: ', tup[1]
    print '========='

print
print 
print 'Top US Titles'
print
    
# Truncate to 5
for tup in top_us_titles[:5]:
    print 'Title :', tup[0]
    print 'Rating: ', tup[1]
    print '========='
    
print
print

print 'Average score of top 25 Titles'
print
print 'NZ top 25 average'

score = 0
count = 0
for tup in top_nz_titles:
    count += 1
    score += tup[1]
print score/count

print 

print 'US top 25 average'
score = 0
count = 0
for tup in top_us_titles:
    count += 1
    score += tup[1]
print score/count


Top NZ Titles

Title : The Shawshank Redemption
Rating:  9.3
=========
Title : Human Planet
Rating:  9.3
=========
Title : Frozen Planet
Rating:  9.3
=========
Title : Firefly
Rating:  9.2
=========
Title : The Godfather
Rating:  9.2
=========


Top US Titles

Title : Generation Earth
Rating:  9.1
=========
Title : Fullmetal Alchemist: Brotherhood
Rating:  9.1
=========
Title : Long Way Round
Rating:  9.1
=========
Title : Tomb Raider
Rating:  9.1
=========
Title : Top Gear
Rating:  9.0
=========


Average score of top 25 Titles

NZ top 25 average
9.0

US top 25 average
8.916

We could potentially display this visually.

We need to do some trickery, as our dictionaries use URL-quoted title names as keys ('The%20Pink%20Panther' instead of 'The Pink Panther'), so we need to 'quote' the movie name before looking up its poster image.
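For reference, this is what quoting does to a title. The import is written so that the snippet runs on both Python 2 and Python 3; the titles are simply examples.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2

# Spaces and other reserved characters become percent-escapes.
key = quote('The Pink Panther')
apostrophe_key = quote("Chef's Table")

# unquote reverses the transformation.
original = unquote(key)
```

Here `key` is `'The%20Pink%20Panther'`, which matches the form of the dictionary keys.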

Top NZ Title Visually


In [29]:
# truncate to top 5
for tup in top_nz_titles[:5]:
    print 'Title :', tup[0]
    print 'Rating:', tup[1]
    try:
        poster = Image(nz_data[quote(tup[0])]['Poster'])
        display(poster)
    except Exception:
        # no usable poster for this title
        print '''
        |
        | No Poster
        |       
        '''


Title : The Shawshank Redemption
Rating: 9.3
Title : Human Planet
Rating: 9.3
Title : Frozen Planet
Rating: 9.3
Title : Firefly
Rating: 9.2
Title : The Godfather
Rating: 9.2

The Bieber Effect

Across the top 25, NZ and the US are close, with the NZ library taking a slight lead. How do we compare in terms of the bottom-ranked content?


In [30]:
bottom_nz_titles = content_stats.Title_stats(nz_data).bottom_movies(25)
bottom_us_titles = content_stats.Title_stats(us_data).bottom_movies(25)

print 'Bottom ranked NZ title', bottom_nz_titles[-1][0], ':' , bottom_nz_titles[-1][1]

poster = Image(nz_data[quote(bottom_nz_titles[-1][0])]['Poster'])
display(poster)

print 'Bottom ranked US title', bottom_us_titles[-1][0], ':' , bottom_us_titles[-1][1]

poster = Image(us_data[quote(bottom_us_titles[-1][0])]['Poster'])
display(poster)


print 
print

print 'Average score of bottom 25 Titles'
print
print 'NZ bottom 25 average'

score = 0
count = 0
for tup in bottom_nz_titles:
    count += 1
    score += tup[1]
print score/count

print 

print 'US bottom 25 average'
score = 0
count = 0
for tup in bottom_us_titles:
    count += 1
    score += tup[1]
print score/count


Bottom ranked NZ title Justin Bieber: Never Say Never : 1.6
Bottom ranked US title Biebermania! : 1.1

Average score of bottom 25 Titles

NZ bottom 25 average
3.104

US bottom 25 average
1.832

It seems NZ might have a smaller library but better quality (based on IMDb ratings). In absolute terms this makes some sense, as a larger library has more room for poorly rated titles. We can look at a distribution to get a better understanding.
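One way to build such a distribution is to round each rating down to a whole number, count the titles per score, and express the counts as percentages so that libraries of different sizes can be compared. The helpers and sample ratings below are hypothetical, and we are assuming rather than asserting that `ratings_distribution()` floors ratings in this way.

```python
import math
from collections import Counter


def floored_ratings(ratings):
    """Round each rating down to the nearest whole number."""
    return [int(math.floor(rating)) for rating in ratings]


def percentage_by_score(floored):
    """Share of titles at each whole-number score, as a percentage."""
    counts = Counter(floored)
    total = sum(counts.values())
    return dict((score, round(100.0 * count / total, 1))
                for score, count in counts.items())


# Hypothetical ratings for eight titles.
sample_ratings = [9.3, 9.2, 8.9, 6.5, 6.1, 1.6, 6.9, 3.4]
sample_distribution = percentage_by_score(floored_ratings(sample_ratings))
```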


In [31]:
hist_nz_titles = content_stats.Title_stats(nz_data).ratings_distribution()
hist_us_titles = content_stats.Title_stats(us_data).ratings_distribution()

In [32]:
trace1 = Histogram(
    x = hist_nz_titles,
    opacity=0.75,
    name = 'Count of NZ Titles'
)
trace2 = Histogram(
    x = hist_us_titles,
    opacity = 0.75,
    name = 'Count of US Titles'
)
data = Data([trace2, trace1])
layout = Layout(
    barmode ='overlay',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'IMDb Score (rounded down)'),
    title = 'Netflix Library Comparison June 2015 Distribution of IMDB scores'
)
fig = Figure(data=data, layout=layout)

# py.iplot(fig, filename = 'Netflix-Library-Comparison-June-2015-Distribution of IMDB scores')

HTML('''

<div>
    <a href="https://plot.ly/~gotofftherails/207/" target="_blank" title="Netflix Library Comparison June 2015 Distribution of IMDB scores" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/207.png" alt="Netflix Library Comparison June 2015 Distribution of IMDB scores" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:207" src="https://plot.ly/embed.js" async></script>
</div>


<div>

    <a href="https://plot.ly/~gotofftherails/207/">Link to interactive chart and data</a>

</div>

''')





In [33]:
titles_score_count_nz = Counter(hist_nz_titles)
titles_score_count_us = Counter(hist_us_titles)


# make relative
nz_tot = sum(titles_score_count_nz.values())
titles_score_count_nz_relative = { k:round((float(v)/float(nz_tot))*100, 1) for (k,v) in titles_score_count_nz.items()}

us_tot = sum(titles_score_count_us.values())
titles_score_count_us_relative = { k:round((float(v)/float(us_tot))*100,1) for (k,v) in titles_score_count_us.items()}

In [34]:
nz_x_list = []
nz_y_list = []

for score, count in titles_score_count_nz_relative.iteritems():
    nz_x_list.append(score)
    nz_y_list.append(count)
    
    
us_x_list = []
us_y_list = []

for score, count in titles_score_count_us_relative.iteritems():
    us_x_list.append(score)
    us_y_list.append(count)
    
    



trace1 = (
    Bar( x = us_x_list,
          y = us_y_list,
        name = 'USA',
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )
    )

trace2 = (
    Bar( x = nz_x_list,
          y = nz_y_list,
        name = 'NZ',
            marker = Marker(
            color = 'rgba(255, 144, 33, 0.6)')
        )
    )

layout = Layout(
    title ='Netflix - IMDb scores - US and NZ June 2015 - Percentage of Titles',
    yaxis = YAxis(title = 'Percentage of Titles'),
    xaxis = XAxis(title = 'IMDb Score'),
    barmode='group'
    
    )     

data = Data([trace1, trace2])

fig = Figure(data=data, layout=layout)

py.iplot(fig, filename = 'Netflix-Library-Comparison-Release-US-NZ-June-2015-relative')


Out[34]:

It appears the NZ library, while smaller, has a greater proportion of higher quality content.

New Zealand as Country of Origin

We can see what New Zealand content is represented within our samples. This includes anything that has the identifier New Zealand in 'Country' via IMDb. This can include co-productions.
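A check of this kind might look like the sketch below, which assumes the 'Country' field is a comma-separated string such as 'USA, New Zealand'. `count_country` and the sample records are hypothetical and are not taken from `content_stats`.

```python
def count_country(titles, country='New Zealand'):
    """Count titles whose 'Country' field lists the given country."""
    total = 0
    for record in titles.values():
        countries = [name.strip()
                     for name in record.get('Country', '').split(',')]
        if country in countries:
            total += 1
    return total


# Hypothetical sample: one NZ title, one co-production, two others.
sample_titles = {
    'Title%20A': {'Country': 'New Zealand'},
    'Title%20B': {'Country': 'USA, New Zealand'},  # co-production
    'Title%20C': {'Country': 'USA'},
    'Unmatched%20Title': {'Response': 'False'},     # no 'Country' field
}

nz_origin_count = count_country(sample_titles)
```

Splitting on commas before comparing avoids accidental matches on partial country names.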


In [35]:
# Count of titles listing New Zealand as a country of origin, per service
nz_origin_nz = content_stats.Title_stats(nz_data).nz_origin()
nz_origin_us = content_stats.Title_stats(us_data).nz_origin()


data = (
    [Bar( x = ['NZ', 'USA'],
          y = [nz_origin_nz, nz_origin_us],
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )]
    )

layout = Layout(
    title ='Netflix - Count of Titles with Country of Origin as New Zealand via IMDb',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'Geographic Service'),
    
    )     

fig = Figure(data=data, layout=layout)

# py.iplot(fig, filename = 'Netflix-Library-Comparison-Country-June-2015')

HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/130/" target="_blank" title="Netflix - Count of Titles with Country of Origin as New Zealand via IMDb" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/130.png" alt="Netflix - Count of Titles with Country of Origin as New Zealand via IMDb" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:130" src="https://plot.ly/embed.js" async></script>
</div>

<div>
    <a href="https://plot.ly/~gotofftherails/130/">Link to interactive chart and data</a>
</div>

''')




Neck and neck. We can see what these titles are and get a feel for how Kiwi they are.

Age of Titles

We can look at age of titles within the catalogues. There are some interesting caveats to this as age is represented in three ways:

  • A single year - this could be a movie that came out in a single year or a series that only ran within a single calendar year
  • Across multiple years, for example a series that has run across multiple years (e.g. Friends, 1994–2004)
  • Something that is still running (e.g. Orange is the New Black, 2013–)

Initially we could look at year of first release.
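All three forms start with a four-digit year, so the year of first release can be pulled out with a short regular expression. `first_release_year` is a hypothetical helper written for illustration; it returns `None` when no year can be found.

```python
import re


def first_release_year(year_field):
    """Extract the leading four-digit year from a 'Year' value, if any."""
    match = re.match(r'\s*(\d{4})', year_field or '')
    return int(match.group(1)) if match else None


single = first_release_year('1999')        # a single year
finished = first_release_year('1994–2004')  # a series that has ended
running = first_release_year('2013–')       # a series still running
unknown = first_release_year('N/A')         # no usable year
```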


In [36]:
release_year = content_stats.Title_stats(nz_data).year_first_release_count()

x_list = []
y_list = []

for year, count in release_year.iteritems():
    x_list.append(year)
    y_list.append(count)

data = (
    [Bar( x = x_list,
          y = y_list,
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )]
    )

layout = Layout(
    title ='Netflix - Year of Titles First Release - NZ June 2015',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'Year'),
    
    )     

fig = Figure(data=data, layout=layout)

#py.iplot(fig, filename = 'Netflix-Library-Comparison-Release-June-2015')

HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/132/" target="_blank" title="Netflix - Year of Titles First Release - NZ June 2015" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/132.png" alt="Netflix - Year of Titles First Release - NZ June 2015" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:132" src="https://plot.ly/embed.js" async></script>
</div>

    <div>
    
        <a href="https://plot.ly/~gotofftherails/132/">Link to interactive chart and data</a>
    
    </div>

''')




We can do the same for the USA catalogue.


In [37]:
release_year = content_stats.Title_stats(us_data).year_first_release_count()

x_list = []
y_list = []

for year, count in release_year.iteritems():
    x_list.append(year)
    y_list.append(count)

data = (
    [Bar( x = x_list,
          y = y_list,
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )]
    )

layout = Layout(
    title ='Netflix - Year of Titles First Release - US June 2015',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'Year'),
    
    )     

fig = Figure(data=data, layout=layout)

# py.iplot(fig, filename = 'Netflix-Library-Comparison-Release-US-June-2015')
HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/142/" target="_blank" title="Netflix - Year of Titles First Release - US June 2015" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/142.png" alt="Netflix - Year of Titles First Release - US June 2015" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:142" src="https://plot.ly/embed.js" async></script>
</div>

<div>
        <a href="https://plot.ly/~gotofftherails/142/">Link to interactive chart and data</a>
        
</div>

''')




We can also look at them side by side. Let's look at absolute counts first.


In [38]:
release_year_us = content_stats.Title_stats(us_data).year_first_release_count()

us_x_list = []
us_y_list = []

for year, count in release_year_us.iteritems():
    us_x_list.append(year)
    us_y_list.append(count)

release_year_nz = content_stats.Title_stats(nz_data).year_first_release_count()

nz_x_list = []
nz_y_list = []

for year, count in release_year_nz.iteritems():
    nz_x_list.append(year)
    nz_y_list.append(count)

trace1 = (
    Bar( x = us_x_list,
          y = us_y_list,
        name = 'USA',
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )
    )

trace2 = (
    Bar( x = nz_x_list,
          y = nz_y_list,
        name = 'NZ',
            marker = Marker(
            color = 'rgba(255, 144, 33, 0.6)')
        )
    )

layout = Layout(
    title ='Netflix - Year of Titles First Release - US and NZ June 2015',
    yaxis = YAxis(title = 'Count of Titles'),
    xaxis = XAxis(title = 'Year'),
    barmode='group'
    
    )     

data = Data([trace1, trace2])

fig = Figure(data=data, layout=layout)

#py.iplot(fig, filename = 'Netflix-Library-Comparison-Release-US-NZ-June-2015')

HTML('''<div>
    <a href="https://plot.ly/~gotofftherails/157/" target="_blank" title="Netflix - Year of Titles First Release - US and NZ June 2015" style="display: block; text-align: center;"><img src="https://plot.ly/~gotofftherails/157.png" alt="Netflix - Year of Titles First Release - US and NZ June 2015" style="max-width: 100%;"  onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="gotofftherails:157" src="https://plot.ly/embed.js" async></script>
</div>

<div>

    <a href="https://plot.ly/~gotofftherails/157/">Link to interactive chart and data</a>
    
</div>

''')




The smaller New Zealand library makes it a bit trickier to see which service has newer and older content when it is put against the larger US library, so we convert the counts to percentages of each library.
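When comparing two libraries by year it can also help to put both on the same sorted list of years, filling in zero where one library has no titles. The sketch below does that and converts the counts to percentages; `aligned_percentages` and the sample counts are hypothetical.

```python
def aligned_percentages(first_counts, second_counts):
    """Return sorted years plus each library's percentage share per year."""
    years = sorted(set(first_counts) | set(second_counts))

    def as_percentages(counts):
        total = float(sum(counts.values()))
        if not total:
            return [0.0 for _ in years]
        return [100.0 * counts.get(year, 0) / total for year in years]

    return years, as_percentages(first_counts), as_percentages(second_counts)


# Hypothetical year -> count dictionaries.
nz_counts = {2013: 3, 2014: 1}
us_counts = {2012: 2, 2013: 2, 2014: 4}

years, nz_share, us_share = aligned_percentages(nz_counts, us_counts)
```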


In [39]:
total_us_titles = sum(us_y_list)
us_y_list = [(float(count)/(float(total_us_titles)) * 100) for count in us_y_list]

total_nz_titles = sum(nz_y_list)
nz_y_list = [(float(count)/(float(total_nz_titles)) * 100) for count in nz_y_list]

trace1 = (
    Bar( x = us_x_list,
          y = us_y_list,
        name = 'USA',
            marker = Marker(
            color = 'rgba(34, 95, 250, 0.6)')
        )
    )

trace2 = (
    Bar( x = nz_x_list,
          y = nz_y_list,
        name = 'NZ',
            marker = Marker(
            color = 'rgba(255, 144, 33, 0.6)')
        )
    )

layout = Layout(
    title ='Netflix - Year of Titles First Release - US and NZ June 2015 - Percentage',
    yaxis = YAxis(title = 'Percentage of Titles'),
    xaxis = XAxis(title = 'Year'),
    barmode='group'
    
    )     

data = Data([trace1, trace2])

fig = Figure(data=data, layout=layout)

py.iplot(fig, filename = 'Relative- Netflix-Library-Comparison-Release-US-NZ-June-2015')


Out[39]:

Actors
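Counting how often each actor appears can be sketched with a `Counter`, assuming the 'Actors' field is a comma-separated string and that titles without cast data carry 'N/A'. The `count_top_actors` helper and sample records are hypothetical; the real `top_actors` method may differ, and the cells in this section skip its first entry, which we assume is a placeholder rather than a real name.

```python
from collections import Counter


def count_top_actors(titles, n):
    """Return the n most frequently listed actors as (name, count) pairs."""
    counts = Counter()
    for record in titles.values():
        actors = record.get('Actors', 'N/A')
        if actors == 'N/A':
            continue  # no cast information for this title
        counts.update(name.strip() for name in actors.split(','))
    return counts.most_common(n)


# Hypothetical cast lists.
sample_titles = {
    'Title%20A': {'Actors': 'Adam Sandler, Drew Barrymore'},
    'Title%20B': {'Actors': 'Adam Sandler, Kevin James'},
    'Title%20C': {'Actors': 'N/A'},
}

most_frequent = count_top_actors(sample_titles, 1)
```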


In [40]:
nz_actors_count = content_stats.Title_stats(nz_data).top_actors(21)
us_actors_count = content_stats.Title_stats(us_data).top_actors(21)

In [41]:
print '<table>'
print '    <tr><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>'

for tup in nz_actors_count[1:]:
    print '    <tr>'
    print '        <td>'
    print '            ', tup[0].lstrip()
    print '        </td>'
    print '    <td>'
    print '              ', tup[1]
    print '        </td>'
    print '    </tr>'
print '</table>' 
    
    
print
print '====================='

print '<table>'
print '    <tr><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>'
for tup in us_actors_count[1:]:
    print '    <tr>'
    print '        <td>'
    print '            ', tup[0].lstrip()
    print '        </td>'
    print '    <td>'
    print '              ', tup[1]
    print '        </td>'
    print '    </tr>'

print '</table>'


<table>
    <tr><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>
    <tr>
        <td>
             Adam Sandler
        </td>
    <td>
               10
        </td>
    </tr>
    <tr>
        <td>
             Johnny Depp
        </td>
    <td>
               9
        </td>
    </tr>
    <tr>
        <td>
             Mel Gibson
        </td>
    <td>
               8
        </td>
    </tr>
    <tr>
        <td>
             Christopher Walken
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Sylvester Stallone
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Nicolas Cage
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Mike Myers
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Burt Young
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Morgan Freeman
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Tom Hanks
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Denzel Washington
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Angelina Jolie
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Josh Lucas
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Talia Shire
        </td>
    <td>
               6
        </td>
    </tr>
    <tr>
        <td>
             Dustin Hoffman
        </td>
    <td>
               5
        </td>
    </tr>
    <tr>
        <td>
             Jason Statham
        </td>
    <td>
               5
        </td>
    </tr>
    <tr>
        <td>
             Alec Baldwin
        </td>
    <td>
               5
        </td>
    </tr>
    <tr>
        <td>
             Al Pacino
        </td>
    <td>
               5
        </td>
    </tr>
    <tr>
        <td>
             Justin Bartha
        </td>
    <td>
               5
        </td>
    </tr>
    <tr>
        <td>
             Ashleigh Ball
        </td>
    <td>
               5
        </td>
    </tr>
</table>

=====================
<table>
    <tr><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>
    <tr>
        <td>
             Samuel L. Jackson
        </td>
    <td>
               12
        </td>
    </tr>
    <tr>
        <td>
             Eddie Murphy
        </td>
    <td>
               10
        </td>
    </tr>
    <tr>
        <td>
             Jeff Bennett
        </td>
    <td>
               10
        </td>
    </tr>
    <tr>
        <td>
             Ewan McGregor
        </td>
    <td>
               9
        </td>
    </tr>
    <tr>
        <td>
             Nicolas Cage
        </td>
    <td>
               9
        </td>
    </tr>
    <tr>
        <td>
             Julianne Moore
        </td>
    <td>
               9
        </td>
    </tr>
    <tr>
        <td>
             Arnold Schwarzenegger
        </td>
    <td>
               9
        </td>
    </tr>
    <tr>
        <td>
             David A.R. White
        </td>
    <td>
               8
        </td>
    </tr>
    <tr>
        <td>
             Nicole Kidman
        </td>
    <td>
               8
        </td>
    </tr>
    <tr>
        <td>
             Laura Bailey
        </td>
    <td>
               8
        </td>
    </tr>
    <tr>
        <td>
             Cam Clarke
        </td>
    <td>
               8
        </td>
    </tr>
    <tr>
        <td>
             Jean-Claude Van Damme
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Christopher Lloyd
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Lucy Liu
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Jim Cummings
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Danny Glover
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Debi Derryberry
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Tommy Lee Jones
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Cary Grant
        </td>
    <td>
               7
        </td>
    </tr>
    <tr>
        <td>
             Sam Neill
        </td>
    <td>
               7
        </td>
    </tr>
</table>

In [44]:
print '<table>'
print '<tr><th></th><th>NZ Service</th><th></th><th>US Service</th><th></th></tr>'
print '<tr><th>Rank</th><th>Actor Name</th><th>Count of Titles with Actor Name</th><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>'

for x in range(1,len(nz_actors_count)):
    print '<tr>'
    print '<td>'
    print str(x)
    print '</td>'
    print '<td>'
    print nz_actors_count[x][0].lstrip()
    print '</td>'
    print '<td>'
    print nz_actors_count[x][1]
    print '</td>'
    print '<td>'
    print us_actors_count[x][0].lstrip()
    print '</td>'
    print '<td>'
    print us_actors_count[x][1]
    print '</td>'
    print '</tr>'
    
print '</table>'


<table>
<tr><th></th><th>NZ Service</th><th></th><th>US Service</th><th></th></tr>
<tr><th>Rank</th><th>Actor Name</th><th>Count of Titles with Actor Name</th><th>Actor Name</th><th>Count of Titles with Actor Name</th></tr>
<tr>
<td>
1
</td>
<td>
Adam Sandler
</td>
<td>
10
</td>
<td>
Samuel L. Jackson
</td>
<td>
12
</td>
</tr>
<tr>
<td>
2
</td>
<td>
Johnny Depp
</td>
<td>
9
</td>
<td>
Eddie Murphy
</td>
<td>
10
</td>
</tr>
<tr>
<td>
3
</td>
<td>
Mel Gibson
</td>
<td>
8
</td>
<td>
Jeff Bennett
</td>
<td>
10
</td>
</tr>
<tr>
<td>
4
</td>
<td>
Christopher Walken
</td>
<td>
7
</td>
<td>
Ewan McGregor
</td>
<td>
9
</td>
</tr>
<tr>
<td>
5
</td>
<td>
Sylvester Stallone
</td>
<td>
7
</td>
<td>
Nicolas Cage
</td>
<td>
9
</td>
</tr>
<tr>
<td>
6
</td>
<td>
Nicolas Cage
</td>
<td>
7
</td>
<td>
Julianne Moore
</td>
<td>
9
</td>
</tr>
<tr>
<td>
7
</td>
<td>
Mike Myers
</td>
<td>
6
</td>
<td>
Arnold Schwarzenegger
</td>
<td>
9
</td>
</tr>
<tr>
<td>
8
</td>
<td>
Burt Young
</td>
<td>
6
</td>
<td>
David A.R. White
</td>
<td>
8
</td>
</tr>
<tr>
<td>
9
</td>
<td>
Morgan Freeman
</td>
<td>
6
</td>
<td>
Nicole Kidman
</td>
<td>
8
</td>
</tr>
<tr>
<td>
10
</td>
<td>
Tom Hanks
</td>
<td>
6
</td>
<td>
Laura Bailey
</td>
<td>
8
</td>
</tr>
<tr>
<td>
11
</td>
<td>
Denzel Washington
</td>
<td>
6
</td>
<td>
Cam Clarke
</td>
<td>
8
</td>
</tr>
<tr>
<td>
12
</td>
<td>
Angelina Jolie
</td>
<td>
6
</td>
<td>
Jean-Claude Van Damme
</td>
<td>
7
</td>
</tr>
<tr>
<td>
13
</td>
<td>
Josh Lucas
</td>
<td>
6
</td>
<td>
Christopher Lloyd
</td>
<td>
7
</td>
</tr>
<tr>
<td>
14
</td>
<td>
Talia Shire
</td>
<td>
6
</td>
<td>
Lucy Liu
</td>
<td>
7
</td>
</tr>
<tr>
<td>
15
</td>
<td>
Dustin Hoffman
</td>
<td>
5
</td>
<td>
Jim Cummings
</td>
<td>
7
</td>
</tr>
<tr>
<td>
16
</td>
<td>
Jason Statham
</td>
<td>
5
</td>
<td>
Danny Glover
</td>
<td>
7
</td>
</tr>
<tr>
<td>
17
</td>
<td>
Alec Baldwin
</td>
<td>
5
</td>
<td>
Debi Derryberry
</td>
<td>
7
</td>
</tr>
<tr>
<td>
18
</td>
<td>
Al Pacino
</td>
<td>
5
</td>
<td>
Tommy Lee Jones
</td>
<td>
7
</td>
</tr>
<tr>
<td>
19
</td>
<td>
Justin Bartha
</td>
<td>
5
</td>
<td>
Cary Grant
</td>
<td>
7
</td>
</tr>
<tr>
<td>
20
</td>
<td>
Ashleigh Ball
</td>
<td>
5
</td>
<td>
Sam Neill
</td>
<td>
7
</td>
</tr>
</table>

In [45]:
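# Print the 21 highest-rated titles for each service as HTML tables.
# top_nz_titles and top_us_titles are (title, IMDb score) tuples that are
# already sorted by score, highest first. The title names are kept in
# nzx and usx so the two lists can be compared in a later cell.
print '<table>'
print '<tr><th>New Zealand Service </th></tr>'
print '<tr><th>Movie Name</th><th>IMDb Score</th></tr>'
nzx = []
for tup in top_nz_titles[:21]:
    print '<tr>'
    print '<td>',tup[0],'</td>','<td>', tup[1],'</td>'
    print '</tr>'
    nzx.append(tup[0])

print '</table>'

# two blank lines and a separator between the NZ and the US table
print
print
print '==============='

print '<table>'
print '<tr><th>Movie Name</th><th>IMDb Score</th></tr>'
usx = []
for tup in top_us_titles[:21]:
    print '<tr>'
    print '<td>',tup[0],'</td>','<td>', tup[1],'</td>'
    print '</tr>'
    usx.append(tup[0])
print '</table>'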


<table>
<tr><th>New Zealand Service </th></tr>
<tr><th>Movie Name</th><th>IMDb Score</th></tr>
<tr>
<td> The Shawshank Redemption </td> <td> 9.3 </td>
</tr>
<tr>
<td> Human Planet </td> <td> 9.3 </td>
</tr>
<tr>
<td> Frozen Planet </td> <td> 9.3 </td>
</tr>
<tr>
<td> Firefly </td> <td> 9.2 </td>
</tr>
<tr>
<td> The Godfather </td> <td> 9.2 </td>
</tr>
<tr>
<td> Greg Fleet: Thai Die </td> <td> 9.2 </td>
</tr>
<tr>
<td> Fullmetal Alchemist: Brotherhood </td> <td> 9.1 </td>
</tr>
<tr>
<td> The Godfather: Part II </td> <td> 9.1 </td>
</tr>
<tr>
<td> Arrested Development </td> <td> 9.1 </td>
</tr>
<tr>
<td> Freaks and Geeks </td> <td> 9.0 </td>
</tr>
<tr>
<td> Chef's Table </td> <td> 9.0 </td>
</tr>
<tr>
<td> Top Gear </td> <td> 9.0 </td>
</tr>
<tr>
<td> North & South </td> <td> 9.0 </td>
</tr>
<tr>
<td> Doctor Who </td> <td> 8.9 </td>
</tr>
<tr>
<td> Fight Club </td> <td> 8.9 </td>
</tr>
<tr>
<td> Horrible Histories </td> <td> 8.9 </td>
</tr>
<tr>
<td> Fawlty Towers </td> <td> 8.9 </td>
</tr>
<tr>
<td> The Good, the Bad and the Ugly </td> <td> 8.9 </td>
</tr>
<tr>
<td> The Lord of the Rings: The Return of the King </td> <td> 8.9 </td>
</tr>
<tr>
<td> Forensic Files </td> <td> 8.8 </td>
</tr>
<tr>
<td> The Lord of the Rings: The Two Towers </td> <td> 8.8 </td>
</tr>
</table>


===============
<table>
<tr><th>Movie Name</th><th>IMDb Score</th></tr>
<tr>
<td> Generation Earth </td> <td> 9.1 </td>
</tr>
<tr>
<td> Fullmetal Alchemist: Brotherhood </td> <td> 9.1 </td>
</tr>
<tr>
<td> Long Way Round </td> <td> 9.1 </td>
</tr>
<tr>
<td> Tomb Raider </td> <td> 9.1 </td>
</tr>
<tr>
<td> Top Gear </td> <td> 9.0 </td>
</tr>
<tr>
<td> North & South </td> <td> 9.0 </td>
</tr>
<tr>
<td> Death Note </td> <td> 9.0 </td>
</tr>
<tr>
<td> The Life of Birds </td> <td> 9.0 </td>
</tr>
<tr>
<td> Friends </td> <td> 9.0 </td>
</tr>
<tr>
<td> 24/7 Flyers/Rangers: Road to the NHL Winter Classic </td> <td> 9.0 </td>
</tr>
<tr>
<td> Chef's Table </td> <td> 9.0 </td>
</tr>
<tr>
<td> Dexter </td> <td> 8.9 </td>
</tr>
<tr>
<td> Pulp Fiction </td> <td> 8.9 </td>
</tr>
<tr>
<td> Attack on Titan </td> <td> 8.9 </td>
</tr>
<tr>
<td> Charlie Don't Surf </td> <td> 8.9 </td>
</tr>
<tr>
<td> Slugterra: Slug Fu Showdown </td> <td> 8.8 </td>
</tr>
<tr>
<td> Andaz Apna Apna </td> <td> 8.8 </td>
</tr>
<tr>
<td> Aerial America </td> <td> 8.8 </td>
</tr>
<tr>
<td> The Phantom of the Opera at the Royal Albert Hall </td> <td> 8.8 </td>
</tr>
<tr>
<td> Never Sleep Again: The Elm Street Legacy </td> <td> 8.8 </td>
</tr>
<tr>
<td> It's Always Sunny in Philadelphia </td> <td> 8.8 </td>
</tr>
</table>

In [48]:
# Side-by-side table of the 20 highest-rated titles on each service.
print '<table>'
print '<tr><th></th><th>NZ Service</th><th></th><th>US Service</th><th></th></tr>'
print '<tr><th>Rank</th><th>Movie Name</th><th>IMDb Score</th><th>Movie Name</th><th>IMDb Score</th></tr>'

# zip stops at the shorter list, so a service with fewer than 20 rated
# titles cannot cause an IndexError
for rank, (nz, us) in enumerate(zip(top_nz_titles[:20], top_us_titles[:20]), 1):
    print '<tr>'
    # one cell per line: rank, NZ title, NZ score, US title, US score
    for cell in (str(rank), nz[0].lstrip(), nz[1], us[0].lstrip(), us[1]):
        print '<td>'
        print cell
        print '</td>'
    print '</tr>'

print '</table>'


<table>
<tr><th></th><th>NZ Service</th><th></th><th>US Service</th><th></th></tr>
<tr><th>Rank</th><th>Movie Name</th><th>IMDb Score</th><th>Movie Name</th><th>IMDb Score</th></tr>
<tr>
<td>
1
</td>
<td>
The Shawshank Redemption
</td>
<td>
9.3
</td>
<td>
Generation Earth
</td>
<td>
9.1
</td>
</tr>
<tr>
<td>
2
</td>
<td>
Human Planet
</td>
<td>
9.3
</td>
<td>
Fullmetal Alchemist: Brotherhood
</td>
<td>
9.1
</td>
</tr>
<tr>
<td>
3
</td>
<td>
Frozen Planet
</td>
<td>
9.3
</td>
<td>
Long Way Round
</td>
<td>
9.1
</td>
</tr>
<tr>
<td>
4
</td>
<td>
Firefly
</td>
<td>
9.2
</td>
<td>
Tomb Raider
</td>
<td>
9.1
</td>
</tr>
<tr>
<td>
5
</td>
<td>
The Godfather
</td>
<td>
9.2
</td>
<td>
Top Gear
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
6
</td>
<td>
Greg Fleet: Thai Die
</td>
<td>
9.2
</td>
<td>
North & South
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
7
</td>
<td>
Fullmetal Alchemist: Brotherhood
</td>
<td>
9.1
</td>
<td>
Death Note
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
8
</td>
<td>
The Godfather: Part II
</td>
<td>
9.1
</td>
<td>
The Life of Birds
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
9
</td>
<td>
Arrested Development
</td>
<td>
9.1
</td>
<td>
Friends
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
10
</td>
<td>
Freaks and Geeks
</td>
<td>
9.0
</td>
<td>
24/7 Flyers/Rangers: Road to the NHL Winter Classic
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
11
</td>
<td>
Chef's Table
</td>
<td>
9.0
</td>
<td>
Chef's Table
</td>
<td>
9.0
</td>
</tr>
<tr>
<td>
12
</td>
<td>
Top Gear
</td>
<td>
9.0
</td>
<td>
Dexter
</td>
<td>
8.9
</td>
</tr>
<tr>
<td>
13
</td>
<td>
North & South
</td>
<td>
9.0
</td>
<td>
Pulp Fiction
</td>
<td>
8.9
</td>
</tr>
<tr>
<td>
14
</td>
<td>
Doctor Who
</td>
<td>
8.9
</td>
<td>
Attack on Titan
</td>
<td>
8.9
</td>
</tr>
<tr>
<td>
15
</td>
<td>
Fight Club
</td>
<td>
8.9
</td>
<td>
Charlie Don't Surf
</td>
<td>
8.9
</td>
</tr>
<tr>
<td>
16
</td>
<td>
Horrible Histories
</td>
<td>
8.9
</td>
<td>
Slugterra: Slug Fu Showdown
</td>
<td>
8.8
</td>
</tr>
<tr>
<td>
17
</td>
<td>
Fawlty Towers
</td>
<td>
8.9
</td>
<td>
Andaz Apna Apna
</td>
<td>
8.8
</td>
</tr>
<tr>
<td>
18
</td>
<td>
The Good, the Bad and the Ugly
</td>
<td>
8.9
</td>
<td>
Aerial America
</td>
<td>
8.8
</td>
</tr>
<tr>
<td>
19
</td>
<td>
The Lord of the Rings: The Return of the King
</td>
<td>
8.9
</td>
<td>
The Phantom of the Opera at the Royal Albert Hall
</td>
<td>
8.8
</td>
</tr>
<tr>
<td>
20
</td>
<td>
Forensic Files
</td>
<td>
8.8
</td>
<td>
Never Sleep Again: The Elm Street Legacy
</td>
<td>
8.8
</td>
</tr>
</table>
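
Some titles in these tables, such as North & South, contain characters that are special in HTML. The following is a minimal sketch, assuming Python 3 (with a fallback import for Python 2), of building the same side-by-side table as an escaped string. The function name build_comparison_table and the sample rows are hypothetical and are not part of content_stats.

```python
try:
    from html import escape   # Python 3
except ImportError:
    from cgi import escape    # Python 2 fallback


def build_comparison_table(nz_titles, us_titles, limit=20):
    """Return an HTML table string comparing two ranked (title, score) lists."""
    rows = [
        '<table>',
        '<tr><th></th><th>NZ Service</th><th></th><th>US Service</th><th></th></tr>',
        '<tr><th>Rank</th><th>Movie Name</th><th>IMDb Score</th>'
        '<th>Movie Name</th><th>IMDb Score</th></tr>',
    ]
    # zip stops at the shorter of the two lists
    pairs = zip(nz_titles[:limit], us_titles[:limit])
    for rank, (nz, us) in enumerate(pairs, 1):
        cells = [rank, nz[0].strip(), nz[1], us[0].strip(), us[1]]
        # escape every cell so that '&', '<' and '>' stay valid HTML
        rows.append('<tr>' + ''.join(
            '<td>{}</td>'.format(escape(str(cell))) for cell in cells) + '</tr>')
    rows.append('</table>')
    return '\n'.join(rows)


# hypothetical sample rows, copied from the tables above
sample_nz = [('The Shawshank Redemption', 9.3), ('North & South', 9.0)]
sample_us = [('Generation Earth', 9.1), ('North & South', 9.0)]
table_html = build_comparison_table(sample_nz, sample_us)
print(table_html)
```

In a notebook the returned string could be passed to IPython.display.HTML, which the imports cell already loads, so the table renders directly instead of being printed as markup.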

In [49]:
# titles that appear in both 21-title lists (usx and nzx) built in cell In [45]
my_set = set(usx).intersection(set(nzx))

for title in my_set:
    print '* ', title


*  Fullmetal Alchemist: Brotherhood
*  Chef's Table
*  Top Gear
*  North & South
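
The four shared titles above come from two lists of 21 titles each. The following is a minimal sketch of expressing such an overlap as a single number (the Jaccard similarity), assuming the inputs are plain lists of title strings like nzx and usx. The function name overlap_share and the short sample lists are hypothetical illustrations.

```python
def overlap_share(titles_a, titles_b):
    """Return (shared titles, Jaccard similarity) for two title lists."""
    set_a, set_b = set(titles_a), set(titles_b)
    shared = set_a & set_b
    union = set_a | set_b
    # Jaccard similarity: shared titles divided by all distinct titles
    jaccard = float(len(shared)) / len(union) if union else 0.0
    return shared, jaccard


# hypothetical short lists drawn from the tables above
sample_nzx = ['Firefly', 'Top Gear', "Chef's Table", 'Doctor Who']
sample_usx = ['Friends', 'Top Gear', "Chef's Table", 'Dexter']
shared, jaccard = overlap_share(sample_nzx, sample_usx)
print(sorted(shared))
print(round(jaccard, 3))
```

Applied to the two 21-title lists above, 4 shared titles out of 38 distinct titles would give a similarity of roughly 0.105.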
